1. INTRODUCTION

Consider  the  following  adaptive  control  problem  introduced  by  Rishel  [7]. 

Let $x \in \mathbb{R}^n$ be an unknown parameter. We consider a Bayesian setup, where $x$ is distributed according to some prior density $p_0(x)$, which may or may not be compactly supported. Let $w_t$ be an $m$-dimensional Brownian motion, and define $y_t$ by

$$y_t = y_0 + w_t \qquad (1.1)$$

Let $F_t$ be the sigma field generated by $\{y_s,\ 0 \le s \le t\}$ and $G_t$ be the sigma field generated by $\{y_s,\ 0 \le s \le t\} \vee x$. Let $U$ be a compact subset of $\mathbb{R}^q$. Let $A(x,y) \in \mathbb{R}^m$ and $B(x) \in \mathbb{R}^{m \times q}$ be matrices depending on the unknown parameter $x$ and on $y$. Conditions on $A(x,y)$ and $B(x)$ will be imposed below.

Define  an  admissible  control  u(t)  to  be  a  U  valued  stochastic  process 
satisfying: 

(a) $u_t$ is $F_t$ adapted

(b) $E \exp\left( \frac{1}{2} \int_0^T \|A(x,y_t) + B(x)u_t\|^2 \, dt \right) < \infty \quad \forall x \in \operatorname{supp} p_0(x)$

In particular, if $A(x,y)$ and $B(x)$ are bounded, (b) reduces to a trivial condition.
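To make the last remark concrete, suppose (as an illustrative assumption) that $\|A(x,y)\| \le M_A$ and $\|B(x)\| \le M_B$ for all $x$ in the support of $p_0$ and all $y$, and let $M_U = \max_{u \in U} \|u\|$, which is finite because $U$ is compact. The constants $M_A$, $M_B$, $M_U$, $M$ are notation introduced here only for this sketch.

```latex
\|A(x,y_t) + B(x)u_t\| \;\le\; M_A + M_B M_U \;=:\; M
\quad\Longrightarrow\quad
E \exp\Big( \tfrac{1}{2} \int_0^T \|A(x,y_t) + B(x)u_t\|^2 \, dt \Big)
\;\le\; \exp\big( \tfrac{1}{2} M^2 T \big) \;<\; \infty .
```

So under boundedness every $F_t$ adapted, $U$ valued process satisfies (b).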

The set of admissible controls will be denoted by $\mathcal{U}$. Define:

$$\Lambda_t = \exp\left( \int_0^t (A(x,y_s) + B(x)u_s)^* \, dw_s - \frac{1}{2} \int_0^t \|A(x,y_s) + B(x)u_s\|^2 \, ds \right) \qquad (1.2)$$

where $*$ denotes the operation of taking transposes. By (b), $E(\Lambda_T) = 1$, and we may define a new measure $P^u$ such that $dP^u/dP = \Lambda_T$, under which

$$y_t = y_0 + \int_0^t (A(x,y_s) + B(x)u_s) \, ds + w_t^u \qquad (1.3)$$

where $w_t^u$ is a $P^u$ Brownian motion.

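The adaptive control goal is to minimize over the class of admissible controls an objective cost of the form

$$J^u = E^u \int_0^T f(y_s, u_s, s) \, ds$$

We assume throughout that

$$|f(y,u,t)| \le K(1+|y|^r), \qquad \left| \frac{\partial f(y,u,t)}{\partial y} \right| \le K(1+|y|^r), \qquad \left| \frac{\partial f(y,u,t)}{\partial u} \right| \le K(1+|y|^r) \qquad (1.4)$$

for some $r > 0$.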

Note that (1.4) is a partially observed stochastic control problem, for under $P^u$, $x$ is not known to the controller.

In  [7],  Rishel  has  considered  a  version  of  the  problem  (1.3),  (1.4)  and 
proved  a  stochastic  maximum  principle.  Explicit  solutions  for  particular  cases  were 
derived  in  [1],  [2].  Hijab  [4]  considered  a  modified  version  of  (1.3)  where  under  a 
linearity  assumption  and  a  more  general  model  of  x,  he  found  an  explicit  solution  to 
a  problem  where  information  cost  is  attached  to  and  the  cost  is  a  function  of  x 
and  u.  Here,  we  exploit,  as  in  [1],  [2],  the  finite  dimensionality  of  the  estimation 
problem,  as  follows:  By  taking  conditional  expectations  in  (1.3),  one  has  [7]: 

$$dy_t = E^u(A(x,y_t) \mid F_t) \, dt + E^u(B(x) \mid F_t) u_t \, dt + d\nu_t$$

where $\nu_t$ is an $F_t$ Brownian motion. In the sequel, a hat will denote conditional expectations w.r.t. $F_t$, i.e.

$$\hat{A}(x,y_t) = E^u(A(x,y_t) \mid F_t)$$

For  simplicity  and  concreteness,  we  assume  below  that 


$$A(x,y) = A_0 y + A_1(y) x \qquad (1.6)$$

$$B(x) = B_0 + \tilde{B}_1(x), \qquad \tilde{B}_1^{ij}(x) = \sum_k b^{ij}_k x_k, \quad \text{i.e.}$$

$$B(x)u = B_0 u + B_1(u) x \quad \text{where} \quad B_1^{ik}(u) = \sum_m b^{im}_k u^m \qquad (1.7)$$

The linearity assumption on $A(x,y)$ w.r.t. $x$ can be dispensed with if one has a separation of variables of the form $A(x,y) = A_1(y)g(x) + A_0(y)$; similarly, one could include a $y$-dependence in $B(x)$ in a separation of variables form. Since those extensions are easily handled, we do not consider them here.

Following now the argument of Liptser-Shiryayev [6, ch. 12], appropriately modified to our case due to the non-Gaussian assumptions on $x$ (c.f., e.g., [3], [9]), one has the following:

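Lemma 1.1:

$$p(x \mid F_t) = \frac{\dfrac{p_0(x)}{N_x(0,I)} \exp\left( -\frac{1}{2} (x - Y_t)^* \alpha_t^{-1} (x - Y_t) \right)}{\displaystyle\int \frac{p_0(x)}{N_x(0,I)} \exp\left( -\frac{1}{2} (x - Y_t)^* \alpha_t^{-1} (x - Y_t) \right) dx} \qquad (1.8)$$

where $N_x(0,I) = \exp(-\frac{1}{2} x^* x) / (2\pi)^{n/2}$, $Y_t$ is an $n$-dimensional vector and $\alpha_t$ is an $n \times n$ dimensional matrix which satisfy:

$$dY_t = \alpha_t [A_1(y_t) + B_1(u_t)]^* \, [dy_t - (A_0 y_t + B_0 u_t + (A_1(y_t) + B_1(u_t)) Y_t) \, dt], \qquad Y_0 = 0 \qquad (1.9a)$$

$$d\alpha_t = -\alpha_t [A_1(y_t) + B_1(u_t)]^* [A_1(y_t) + B_1(u_t)] \alpha_t \, dt, \qquad \alpha_0 = I \qquad (1.9b)$$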

Proof: Note that as in [1], [3], [9] the unnormalized density $d_z \rho_x(z \mid F_t) = \mathrm{Prob}(x \in (z, z+dz) \mid F_t) \cdot K_t$, where $K_t$ is $F_t$ adapted, is of the form:

$$d_z \rho_x(z \mid F_t) = E(\Lambda_T 1_{x \in (z, z+dz)} \mid F_t) = E(\Lambda_t 1_{x \in (z, z+dz)} \mid F_t)$$

$$\propto p_0(z) \, dz \, \exp\left( -\frac{1}{2} \int_0^t z^* [A_1(y_s) + B_1(u_s)]^* [A_1(y_s) + B_1(u_s)] z \, ds + \int_0^t z^* [A_1(y_s) + B_1(u_s)]^* (dy_s - (A_0 y_s + B_0 u_s) \, ds) \right)$$

where the proportionality is up to an $F_t$ adapted factor. Dividing and multiplying by $N_x(0,I)$, one obtains (1.8). We remark that exactly as in [6], $\alpha_t$ is positive definite for $0 \le t \le T$. □
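The positive definiteness of $\alpha_t$ can also be seen directly. The following short derivation is a sketch, reading (1.9b) as $d\alpha_t = -\alpha_t C_t^* C_t \alpha_t \, dt$ with $\alpha_0 = I$, where $C_t := A_1(y_t) + B_1(u_t)$ is shorthand introduced here.

```latex
d(\alpha_t^{-1}) = -\alpha_t^{-1} \, (d\alpha_t) \, \alpha_t^{-1} = C_t^* C_t \, dt
\quad\Longrightarrow\quad
\alpha_t^{-1} = I + \int_0^t C_s^* C_s \, ds \;\succeq\; I ,
\qquad\text{hence}\qquad 0 \prec \alpha_t \preceq I .
```

In particular $\alpha_t$ stays bounded, and it is of finite variation, so no Ito correction appears when it multiplies other processes.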

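Note that $y_t$ can now be rewritten as:

$$dy_t = \left( A_0 y_t + B_0 u_t + (A_1(y_t) + B_1(u_t)) F(Y_t, \alpha_t) \right) dt + d\nu_t, \qquad y_0 = 0 \qquad (1.9c)$$

where

$$F(Y,\alpha) = \frac{\displaystyle\int x \, \frac{p_0(x)}{N_x(0,I)} \, N_x(Y, \alpha^{-1}) \, dx}{\displaystyle\int \frac{p_0(x)}{N_x(0,I)} \, N_x(Y, \alpha^{-1}) \, dx} \qquad (1.10)$$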


where $N_x(Y,\alpha)$ denotes a Gaussian distribution with mean $Y$ and covariance matrix $\alpha^{-1}$. Note that (1.9) together with (1.4) form a completely observable stochastic control problem. It is, however, a somewhat complicated one due to the degeneracy of the diffusion matrix, the fact that the control enters the diffusion matrix, and the non-Lipschitz coefficients of (1.9).
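To get a feel for the filter equations, here is a minimal simulation sketch that is not part of the original paper. It assumes the scalar case $n = m = 1$ with $A_0 = B_0 = 0$, a standard Gaussian prior, and a constant gain `c` standing in for $A_1(y_t) + B_1(u_t)$, and it reads (1.9a), (1.9b) as $dY = \alpha c (dy - cY\,dt)$, $d\alpha = -\alpha^2 c^2 \, dt$. The function name and all parameters are hypothetical.

```python
import numpy as np


def simulate_scalar_filter(x_true=0.8, c=1.0, T=1.0, n_steps=1000, seed=0):
    """Euler-Maruyama sketch of (1.9a)-(1.9b) for n = m = 1, A_0 = B_0 = 0.

    The constant c is an assumed stand-in for A_1(y_t) + B_1(u_t).
    Returns the final observation y_T, filter state Y_T and alpha_T.
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    y, Y, alpha = 0.0, 0.0, 1.0  # y_0 = 0, Y_0 = 0, alpha_0 = I
    for _ in range(n_steps):
        # Observation increment: dy = c * x dt + dw, as in (1.3) for this case.
        dy = c * x_true * dt + np.sqrt(dt) * rng.standard_normal()
        # Filter updates, both using the values at the start of the step.
        Y_new = Y + alpha * c * (dy - c * Y * dt)
        alpha_new = alpha - alpha * c * c * alpha * dt
        y += dy
        Y, alpha = Y_new, alpha_new
    return y, Y, alpha


if __name__ == "__main__":
    y_T, Y_T, alpha_T = simulate_scalar_filter()
    print(y_T, Y_T, alpha_T)
```

In this special case one expects $\alpha_T \approx 1/(1 + c^2 T)$ and $Y_T \approx \alpha_T c \, y_T$, up to discretization error, which gives a quick consistency check on the two equations.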

In some simple cases (and specifically, in the case where $B(x) = B$), Benes and Rishel [1] have been able to compute optimal controls explicitly via the Hamilton-Jacobi-Bellman equation and the maximum principle. In the general case, however, the Bellman equation does not seem solvable and we are led to consider $\epsilon$-optimal approximations.

Remark. Note that if $p_0(x) = N_x(0,I)$, then $F(Y,\alpha) = Y$.
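The remark can be checked numerically in one dimension. The sketch below is illustrative only: it evaluates the ratio in (1.10) on a grid, assuming $n = 1$ and reading $N_x(Y, \alpha^{-1})$ as a Gaussian in $x$ with mean $Y$ and variance $\alpha$ (consistent with the exponent in (1.8)). The function names, grid parameters and the uniform prior are assumptions made for the example.

```python
import numpy as np


def conditional_mean_F(Y, alpha, log_p0, half_width=12.0, n_grid=20001):
    """Grid approximation of F(Y, alpha) in (1.10) for scalar x.

    log_p0 is the log of the prior density; working in logs avoids dividing
    by the tiny values of N_x(0, 1) in the tails.
    """
    x = np.linspace(-half_width, half_width, n_grid)
    log_n01 = -0.5 * x**2 - 0.5 * np.log(2.0 * np.pi)
    # log of [p_0(x) / N_x(0,1)] * exp(-(x - Y)^2 / (2 alpha)), up to a constant.
    log_w = log_p0(x) - log_n01 - 0.5 * (x - Y) ** 2 / alpha
    w = np.exp(log_w - np.max(log_w))  # rescale for numerical stability
    # The grid spacing cancels in the ratio of the two Riemann sums.
    return float(np.sum(x * w) / np.sum(w))


def log_standard_normal(x):
    return -0.5 * x**2 - 0.5 * np.log(2.0 * np.pi)


def log_uniform_on_unit_interval(x):
    # Compactly supported prior: uniform on [-1, 1].
    return np.where(np.abs(x) <= 1.0, np.log(0.5), -np.inf)


if __name__ == "__main__":
    print(conditional_mean_F(0.7, 0.5, log_standard_normal))
    print(conditional_mean_F(0.7, 0.5, log_uniform_on_unit_interval))
```

With the Gaussian prior the ratio $p_0 / N_x(0,I)$ is identically one, so the result reduces to the mean $Y$; with the uniform prior the value is pulled inside the support $[-1, 1]$.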
2. $\epsilon$-OPTIMAL RANDOMIZED MARKOV STRATEGIES.

In this section, we construct $\epsilon$-optimal randomized Markov strategies for the problem posed in section 1, i.e. for (1.9) and (1.4). Those strategies are defined in terms of a classical solution of an associated Bellman equation. For simplicity, we make the following structural restrictions. Those restrictions are not crucial and could be avoided at the expense of more cumbersome expressions and proofs. Additional restrictions of a more technical nature (boundedness etc.) will be imposed later (c.f. lemma 2.1).

Assumptions. 

$$A_0 = B_0 = 0 \qquad (2.1a)$$

$$p_0(x) \sim N_x(0,I) \qquad (2.1b)$$

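We will seek to apply the method of [5, ch. 5]. To do that, it will however be convenient to rewrite (1.9) in a different way. Let $\beta_t = \alpha_t^{-1} Y_t$. Then

$$d\alpha_t = -\alpha_t [A_1(y_t) + B_1(u_t)]^* [A_1(y_t) + B_1(u_t)] \alpha_t \, dt, \qquad \alpha_0 = I \qquad (2.2a)$$

$$d\beta_t = [A_1(y_t) + B_1(u_t)]^* [A_1(y_t) + B_1(u_t)] \alpha_t \beta_t \, dt + [A_1(y_t) + B_1(u_t)]^* \, d\nu_t, \qquad \beta_0 = 0 \qquad (2.2b)$$

$$dy_t = [A_1(y_t) + B_1(u_t)] \alpha_t \beta_t \, dt + d\nu_t \qquad (2.2c)$$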
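For the reader who wants to verify the change of variables, the following is a sketch of the computation behind (2.2b). It uses the shorthand $C_t := A_1(y_t) + B_1(u_t)$ (introduced here), the assumptions (2.1) so that $A_0 = B_0 = 0$ and $F(Y,\alpha) = Y$, and reads (1.9a), (1.9b) as $dY_t = \alpha_t C_t^* (dy_t - C_t Y_t \, dt)$ and $d\alpha_t = -\alpha_t C_t^* C_t \alpha_t \, dt$.

```latex
\begin{aligned}
d(\alpha_t^{-1}) &= C_t^* C_t \, dt, \\
d\beta_t &= d(\alpha_t^{-1}) \, Y_t + \alpha_t^{-1} \, dY_t
          = C_t^* C_t Y_t \, dt + C_t^* (dy_t - C_t Y_t \, dt)
          = C_t^* \, dy_t, \\
dy_t &= C_t Y_t \, dt + d\nu_t = C_t \alpha_t \beta_t \, dt + d\nu_t, \\
\Longrightarrow \quad d\beta_t &= C_t^* C_t \alpha_t \beta_t \, dt + C_t^* \, d\nu_t .
\end{aligned}
```

No second-order term appears in the product rule because $\alpha_t^{-1}$ is of finite variation.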


In the sequel, $b$ will denote the drift vector in (2.2) and $\sigma$ will denote the diffusion matrix there. Note that (2.2) does not satisfy the conditions of [5, ch. 5] and therefore the methods described there have to be modified to be applicable.

Note that (2.2) is locally Lipschitz; however, we will need global Lipschitz conditions. Towards this end, let

$$\Omega_R = \{ \beta \in \mathbb{R}^n,\ y \in \mathbb{R}^m : |\beta| < R,\ |y| < R \}$$

$$\tau_R = \inf\{ t > 0 : (\beta_t, y_t) \in \partial\Omega_R \}$$

For $g$ denoting either $b(\alpha,\beta,y)$ or $\sigma(\alpha,\beta,y)$, let

$$g^R(\alpha,\beta,y) = g\left( \alpha,\ \frac{\beta}{|\beta|} (|\beta| \wedge R),\ \frac{y}{|y|} (|y| \wedge R) \right)$$
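The truncation above simply rescales $\beta$ and $y$ radially onto the ball of radius $R$ before evaluating the coefficient. A minimal sketch of this operation follows; the helper names are hypothetical and the coefficient `g` is any user-supplied function of `(alpha, beta, y)`.

```python
import numpy as np


def radial_clip(v, R):
    """Return v if |v| <= R, otherwise v rescaled to have norm R."""
    v = np.asarray(v, dtype=float)
    norm = np.linalg.norm(v)
    if norm <= R:
        return v.copy()
    return v * (R / norm)


def truncate_coefficient(g, R):
    """Build g^R(alpha, beta, y) = g(alpha, clip(beta), clip(y))."""
    def g_R(alpha, beta, y):
        return g(alpha, radial_clip(beta, R), radial_clip(y, R))
    return g_R


if __name__ == "__main__":
    g = lambda alpha, beta, y: np.linalg.norm(beta) + np.linalg.norm(y)
    g_R = truncate_coefficient(g, 2.0)
    print(g_R(None, np.array([10.0, 0.0]), np.array([0.0, 1.0])))
```

Since radial clipping is a Lipschitz map, a locally Lipschitz `g` becomes globally Lipschitz in `(beta, y)` after truncation, which is the purpose of the construction.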

Let now $\alpha_t^R$, $\beta_t^R$, $y_t^R$ denote the solution to (2.2) when $b^R$, $\sigma^R$ are substituted instead of $b$, $\sigma$. Finally, let $J_R^u$ denote $J^u$ in (1.4) with $y^R$ substituted instead of $y$.

We claim

Lemma 2.1. Assume $|A_1(y)| \le K$, $|B_1(u)| \le K$ for $u \in U$. Then $|J^u - J_R^u| \to 0$ as $R \to \infty$, uniformly on all admissible strategies.


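Proof. Note first that by the boundedness of $|A_1(y)|$, $|B_1(u)|$ and of $\alpha_t$, one has by standard arguments that $E|z_t|^r < \infty$, where $z_t$ stands for either $\beta_t$, $y_t$, $\beta_t^R$, $y_t^R$. Next, we note that

$$|J^u - J_R^u| \le E \int_{\tau_R \wedge T}^{T} |f(y_s, u_s, s) - f(y_s^R, u_s, s)| \, ds \le K_1 P(\tau_R < T) \cdot \sup_{(y_0, \beta_0) \in \Omega_R} \int_0^T E(|y_s|^r + |y_s^R|^r) \, ds$$

Note that by standard estimates,

$$\sup_{(y_0,\beta_0) \in \Omega_R} \ \sup_{t \in [0,T]} E(|y_t|^r) \le K_2 R^r, \qquad \sup_{(y_0,\beta_0) \in \Omega_R} \ \sup_{t \in [0,T]} E(|\beta_t|^r) \le K_3 R^r \qquad (2.4)$$

with a similar bound on $E(|y_t^R|^r)$. On the other hand, for a fixed $p > r$,

$$P(\tau_R < T) \le P\left( K_4 \sup_{t \in [0,T]} |\nu_t| > R \right) \le K_5 \frac{E\left( \sup_{s \le T} |\nu_s|^p \right)}{R^p} \le \frac{K_6}{R^p} \qquad (2.5)$$

where the constants $K_1$ - $K_6$ do not depend on $u$. Therefore,

$$|J^u - J_R^u| \to 0 \quad \text{as } R \to \infty \qquad (2.6)$$

and the lemma is proved. □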

In view of the lemma above, it is enough to build $\epsilon$-optimal strategies for the system indexed by $R$, for $R$ large enough. We will attempt to do that by perturbing (2.2a) and (2.2b) with an auxiliary Brownian motion. For reasons to become clear below, we will not perturb (2.2c). That is the main point where we depart from the classical treatment [5]. Note however that when perturbing (2.2a) such that $\alpha_t$ is no longer positive definite, $b^R(\alpha,\beta,y)$ is no longer Lipschitz continuous and moreover, (2.2a) may have a finite explosion time. To remedy that, we modify $b^R(\alpha,\beta,y)$ on the set of non-positive $\alpha$'s. This will not affect the solution of (2.2) since in (2.2), $\alpha > 0$ a.s.
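One concrete way to carry out such a modification is to project $\alpha$ back onto a bounded set of positive semidefinite matrices before evaluating the drift. The sketch below is illustrative and rests on an assumption not stated in the text: the bound on $\alpha$ is measured in the spectral norm, in which case the Frobenius-nearest point in $\{0 \le \alpha \le R I\}$ is obtained by clipping eigenvalues to $[0, R]$. The function name is hypothetical.

```python
import numpy as np


def project_bounded_psd(a, R):
    """Project a symmetric matrix onto {0 <= a <= R*I} (spectral-norm bound).

    The input is symmetrized first; the projection clips the eigenvalues
    to [0, R] and keeps the eigenvectors.
    """
    a = np.asarray(a, dtype=float)
    sym = 0.5 * (a + a.T)
    eigvals, eigvecs = np.linalg.eigh(sym)
    clipped = np.clip(eigvals, 0.0, R)
    # Rebuild V diag(clipped) V^T by scaling the eigenvector columns.
    return (eigvecs * clipped) @ eigvecs.T


if __name__ == "__main__":
    a = np.array([[2.0, 0.0], [0.0, -1.0]])
    print(project_bounded_psd(a, 1.5))
```

A matrix that is already positive semidefinite with eigenvalues at most $R$ is left unchanged, so a drift composed with this map agrees with the original drift along the unperturbed solution, where $0 < \alpha_t \le I$.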

Let $\bar{b}^R(\alpha,\beta,y) = b^R(P\alpha,\beta,y)$ where $P\alpha$ denotes the projection of the symmetric matrix $\alpha$ on the convex set $\{ |\alpha| \le R,\ \alpha \ge 0 \}$. Note that the system (2.2) with $(\bar{b}^R, \sigma^R)$ is identical to the system with $(b^R, \sigma^R)$, and that $(\bar{b}^R, \sigma^R)$ are globally Lipschitz. Consider the following perturbed system:


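$$d\alpha_t^\epsilon = \bar{b}_\alpha^R(u, \alpha_t^\epsilon, y_t^\epsilon) \, dt + \epsilon I \, dw_t^1 \qquad (2.7a)$$

$$d\beta_t^\epsilon = \bar{b}_\beta^R(u, \alpha_t^\epsilon, \beta_t^\epsilon, y_t^\epsilon) \, dt + \sigma_\beta^R(u, y_t^\epsilon) \, d\nu_t + \epsilon I \, dw_t^2 \qquad (2.7b)$$

$$dy_t^\epsilon = \bar{b}_y^R(u, \alpha_t^\epsilon, \beta_t^\epsilon, y_t^\epsilon) \, dt + d\nu_t \qquad (2.7c)$$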

where $w_t^1$, $w_t^2$ are independent Brownian motions of appropriate dimensions. Note that (2.7) is uniformly nondegenerate, and that for $\epsilon = 0$, (2.7) reduces to (2.2) with $(\bar{b}^R, \sigma^R)$ instead of $(b, \sigma)$. Let $v_\epsilon^R(s,x)$ denote the value function of the control problem (2.7) together with (1.4), i.e.

$$v_\epsilon^R(s,x) = \inf_{\substack{u \in \mathcal{U} \\ (\alpha_0, \beta_0, y_0) = x}} J_R^u(\alpha^\epsilon, \beta^\epsilon, y^\epsilon) \qquad (2.8)$$


and let $L_\epsilon^u$ be the backward Kolmogorov operator associated with (2.7). By [5, Thm. 4.7.7], we have that $v_\epsilon^R(s,x) \in W^{1,2}(C_{T,R_1})$ for each $R_1 > 0$. Moreover, by [5, Lemma 5.1.1], for each $\delta > 0$ and $R_1 > 0$ there exists an infinitely differentiable feedback strategy $u_{R_1}^{\epsilon,\delta}(x_\epsilon, t)$ with uniformly bounded spatial first derivatives such that $\sigma^R(u_{R_1}^{\epsilon,\delta}(x_\epsilon, t), x)$ and $\bar{b}^R(u_{R_1}^{\epsilon,\delta}(x_\epsilon, t), x)$ are uniformly Lipschitz continuous and such that

$$\sup_{t \in (0,T)} \left| F[v_\epsilon^R] - \left( L_\epsilon^{u_{R_1}^{\epsilon,\delta}} v_\epsilon^R + f^{u_{R_1}^{\epsilon,\delta}} \right) \right| \le \delta \qquad (2.9)$$

where

$$F[v_\epsilon^R] = \sup_{u \in U} \left[ L_\epsilon^u v_\epsilon^R + f^u \right]$$

Finally, note that again by [5, Corollary 3.1.13], $v_\epsilon^R(s,x) \to v^R(s,x)$ uniformly in $C_{T,R}$.
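The uniform nondegeneracy claimed for (2.7) can be checked on the $(\beta, y)$ block, which is where the unperturbed equation (2.7c) might seem to cause trouble. The following is a sketch of that check; $\Sigma$ and $a$ are notation introduced here, and $\sigma_\beta$ abbreviates the coefficient of $d\nu_t$ in (2.7b).

```latex
\begin{pmatrix} d\beta_t^\epsilon \\ dy_t^\epsilon \end{pmatrix}
= (\cdots)\,dt + \Sigma \begin{pmatrix} d\nu_t \\ dw_t^2 \end{pmatrix},
\qquad
\Sigma = \begin{pmatrix} \sigma_\beta & \epsilon I \\ I & 0 \end{pmatrix},
\qquad
a = \Sigma \Sigma^* =
\begin{pmatrix} \sigma_\beta \sigma_\beta^* + \epsilon^2 I & \sigma_\beta \\ \sigma_\beta^* & I \end{pmatrix}.

% Schur complement of the lower-right block I:
(\sigma_\beta \sigma_\beta^* + \epsilon^2 I) - \sigma_\beta \, I^{-1} \sigma_\beta^* = \epsilon^2 I \succ 0
\quad\Longrightarrow\quad a \succ 0 .
```

Since $\sigma_\beta$ is bounded under the conditions of lemma 2.1, the smallest eigenvalue of $a$ is bounded away from zero for each fixed $\epsilon > 0$. The $y$ component already carries the full innovations noise, so it does not need an auxiliary perturbation to achieve this.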


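We can now state our main result:

Theorem. Assume the conditions in lemma 2.1. Let

$$\hat{u}_{R_1}^{\epsilon,\delta}(x,t) = u_{R_1}^{\epsilon,\delta}(\alpha + \epsilon w^1,\ \beta + \epsilon w^2,\ y).$$

Then

$$\lim_{\epsilon \to 0} \ \lim_{R_1 \to \infty} \ \lim_{\delta \downarrow 0} J_R^{\hat{u}_{R_1}^{\epsilon,\delta}} = \inf_{u \in \mathcal{U}} J_R^u = \lim_{\epsilon \to 0} v_\epsilon^R(0, x_0)$$

i.e., one can construct a feedback randomized control for the system (2.2) which will be as close as desired to the optimal control.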
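The randomization in the theorem amounts to feeding the smooth feedback law a state whose $\alpha$ and $\beta$ components are shifted by independent auxiliary Brownian paths scaled by $\epsilon$. The following is a minimal sketch of such a wrapper under stated assumptions: the class name, its interface, and the time-stepping of the auxiliary paths are hypothetical, and `feedback` stands for a user-supplied function playing the role of $u_{R_1}^{\epsilon,\delta}$.

```python
import numpy as np


class RandomizedFeedback:
    """Wrap a feedback law u(alpha, beta, y, t) with auxiliary noise.

    Evaluates u(alpha + eps * w1, beta + eps * w2, y, t), where w1 and w2
    are independent Brownian paths simulated alongside the state.
    """

    def __init__(self, feedback, eps, alpha_shape, beta_dim, seed=0):
        self.feedback = feedback
        self.eps = eps
        self.rng = np.random.default_rng(seed)
        self.w1 = np.zeros(alpha_shape)  # auxiliary noise for alpha
        self.w2 = np.zeros(beta_dim)     # auxiliary noise for beta

    def step(self, dt):
        """Advance both auxiliary Brownian paths by a time increment dt."""
        self.w1 += np.sqrt(dt) * self.rng.standard_normal(self.w1.shape)
        self.w2 += np.sqrt(dt) * self.rng.standard_normal(self.w2.shape)

    def __call__(self, alpha, beta, y, t):
        return self.feedback(
            alpha + self.eps * self.w1, beta + self.eps * self.w2, y, t
        )


if __name__ == "__main__":
    u = lambda a, b, y, t: float(np.sum(a) + np.sum(b) + np.sum(y))
    control = RandomizedFeedback(u, eps=0.1, alpha_shape=(2, 2), beta_dim=2)
    control.step(0.01)
    print(control(np.eye(2), np.zeros(2), np.ones(1), 0.0))
```

Note that $y$ is passed through unchanged, matching the fact that only the $\alpha$ and $\beta$ equations are perturbed. With `eps = 0` the wrapper reduces to the underlying feedback law.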

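Proof. The proof is an adaptation of the proof of [5, Thm. 5.2.5]. Note that under $\hat{u}_{R_1}^{\epsilon,\delta}$ the system (2.2) (with $(\bar{b}^R, \sigma^R)$ instead of $(b, \sigma)$) is transformed into the system:

$$d\alpha_t^\epsilon = \bar{b}_\alpha^R\left( u_{R_1}^{\epsilon,\delta}(\alpha_t^\epsilon, \beta_t^\epsilon, y_t^\epsilon),\ \alpha_t^\epsilon - \epsilon w_t^1,\ \beta_t^\epsilon - \epsilon w_t^2,\ y_t^\epsilon \right) dt + \epsilon \, dw_t^1 \qquad (2.10a)$$

$$d\beta_t^\epsilon = \bar{b}_\beta^R\left( u_{R_1}^{\epsilon,\delta}(\alpha_t^\epsilon, \beta_t^\epsilon, y_t^\epsilon),\ \alpha_t^\epsilon - \epsilon w_t^1,\ \beta_t^\epsilon - \epsilon w_t^2,\ y_t^\epsilon \right) dt + \epsilon \, dw_t^2 + \sigma_\beta^R\left( y_t^\epsilon, u_{R_1}^{\epsilon,\delta}(\alpha_t^\epsilon, \beta_t^\epsilon, y_t^\epsilon) \right) d\nu_t \qquad (2.10b)$$

$$dy_t^\epsilon = \bar{b}_y^R\left( u_{R_1}^{\epsilon,\delta}(\alpha_t^\epsilon, \beta_t^\epsilon, y_t^\epsilon),\ \alpha_t^\epsilon - \epsilon w_t^1,\ \beta_t^\epsilon - \epsilon w_t^2,\ y_t^\epsilon \right) dt + d\nu_t \qquad (2.10c)$$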


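Since (2.10) is uniformly nondegenerate and $v_\epsilon^R(s,x) \in W^{1,2}(C_{T,R_1})$, one has from the weak Ito formula ([5, Thm. 2.10.1]) that, for each $(\epsilon, \delta, R_1) = \eta$,

$$v_\epsilon^R(0, x_0) = E\left\{ \int_0^{\tau_{R_1} \wedge T} f^{u_{R_1}^{\epsilon,\delta}}(t, x_t^\eta) \, dt + v_\epsilon^R\left( T \wedge \tau_{R_1},\ x_{T \wedge \tau_{R_1}}^\eta \right) \right\}$$

$$\qquad - E \int_0^{\tau_{R_1} \wedge T} \left[ L_\epsilon^{u_{R_1}^{\epsilon,\delta}} v_\epsilon^R + f^{u_{R_1}^{\epsilon,\delta}} \right] dt$$

$$\qquad + E \int_0^{\tau_{R_1} \wedge T} \left[ \bar{b}^R\left( u_{R_1}^{\epsilon,\delta}(s, x_s^\eta),\ x_s^\eta \right) - \bar{b}^R\left( u_{R_1}^{\epsilon,\delta}(s, x_s^\eta),\ x_s^\eta - \epsilon w_s \right) \right] \nabla_x v_\epsilon^R(s, x_s^\eta) \, ds \qquad (2.11)$$

By the same arguments as in [5, Thm. 5.2.5], the first terms converge to $J_R^{\hat{u}_{R_1}^{\epsilon,\delta}}$, the second term converges to zero, whereas since $|\nabla_x v_\epsilon(s, x_\epsilon)| \le K(1 + |x_\epsilon|^r)$, one has also the required convergence for the last term. The theorem is proved. □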

Remarks: 1) Note that the main difference from [5] is that we have used the $\epsilon$-perturbation only in some of the components of the diffusion. Perturbing all the components would have violated the structural condition under which one may trade controls for randomized controls.

2) The theorem proved allows one to actually build an $\epsilon$-optimal control. Indeed, for a given tolerance $e > 0$, pick $\delta$, $R$, $R_1$, $\epsilon$ such that

$$J^{\hat{u}_{R_1}^{\epsilon,\delta}} \le \inf_{u \in \mathcal{U}} J^u + e$$

Such a choice exists due to the theorem. Then $\hat{u}_{R_1}^{\epsilon,\delta}$ is the required randomized feedback control. Note that $u_{R_1}^{\epsilon,\delta}$, which is the feedback function needed to build $\hat{u}_{R_1}^{\epsilon,\delta}$, is obtained via a classical solution of the Bellman equation associated with the system (2.7)-(2.8).